{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Image Classification from scratch with TPUs on Cloud ML Engine using ResNet\n",
    "\n",
    "This notebook demonstrates how to do image classification from scratch on a flowers dataset using TPUs and the resnet trainer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "PROJECT = 'cloud-training-demos' # REPLACE WITH YOUR PROJECT ID\n",
    "BUCKET = 'cloud-training-demos-ml' # REPLACE WITH YOUR BUCKET NAME\n",
    "REGION = 'us-central1' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1\n",
    "\n",
    "# do not change these\n",
    "os.environ['PROJECT'] = PROJECT\n",
    "os.environ['BUCKET'] = BUCKET\n",
    "os.environ['REGION'] = REGION\n",
    "os.environ['TFVERSION'] = '1.9'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "gcloud config set project $PROJECT\n",
    "gcloud config set compute/region $REGION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Convert JPEG images to TensorFlow Records\n",
    "\n",
    "My dataset consists of JPEG images in Google Cloud Storage. I have two CSV files that are formatted as follows:\n",
    "   image-name, category\n",
    "\n",
    "Instead of reading the images from JPEG each time, we'll convert the JPEG data and store it as TF Records.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "gcloud storage cat gs://cloud-ml-data/img/flower_photos/train_set.csv | head -5 > /tmp/input.csv\n",
    "cat /tmp/input.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "gcloud storage cat gs://cloud-ml-data/img/flower_photos/train_set.csv  | sed 's/,/ /g' | awk '{print $2}' | sort | uniq > /tmp/labels.txt\n",
    "cat /tmp/labels.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clone the TPU repo\n",
    "\n",
    "Let's git clone the repo and get the preprocessing and model files. The model code has imports of the form:\n",
    "<pre>\n",
    "import resnet_model as model_lib\n",
    "</pre>\n",
    "We will need to change this to:\n",
    "<pre>\n",
    "from . import resnet_model as model_lib\n",
    "</pre>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile copy_resnet_files.sh\n",
    "#!/bin/bash\n",
    "rm -rf tpu\n",
    "git clone https://github.com/tensorflow/tpu\n",
    "cd tpu\n",
    "TFVERSION=$1\n",
    "echo \"Switching to version r$TFVERSION\"\n",
    "git checkout r$TFVERSION\n",
    "cd ..\n",
    "  \n",
    "MODELCODE=tpu/models/official/resnet\n",
    "OUTDIR=mymodel\n",
    "rm -rf $OUTDIR\n",
    "\n",
    "# preprocessing\n",
    "cp -r imgclass $OUTDIR   # brings in setup.py and __init__.py\n",
    "cp tpu/tools/datasets/jpeg_to_tf_record.py $OUTDIR/trainer/preprocess.py\n",
    "\n",
    "# model: fix imports\n",
    "for FILE in $(ls -p $MODELCODE | grep -v /); do\n",
    "    CMD=\"cat $MODELCODE/$FILE \"\n",
    "    for f2 in $(ls -p $MODELCODE | grep -v /); do\n",
    "        MODULE=`echo $f2 | sed 's/.py//g'`\n",
    "        CMD=\"$CMD | sed 's/^import ${MODULE}/from . import ${MODULE}/g' \"\n",
    "    done\n",
    "    CMD=\"$CMD > $OUTDIR/trainer/$FILE\"\n",
    "    eval $CMD\n",
    "done\n",
    "find $OUTDIR\n",
    "echo \"Finished copying files into $OUTDIR\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!bash ./copy_resnet_files.sh $TFVERSION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Enable TPU service account\n",
    "\n",
    "Allow Cloud ML Engine to access the TPU and bill to your project"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile enable_tpu_mlengine.sh\n",
    "SVC_ACCOUNT=$(curl -H \"Authorization: Bearer $(gcloud auth print-access-token)\"  \\\n",
    "    https://ml.googleapis.com/v1/projects/${PROJECT}:getConfig \\\n",
    "              | grep tpuServiceAccount | tr '\"' ' ' | awk '{print $3}' )\n",
    "echo \"Enabling TPU service account $SVC_ACCOUNT to act as Cloud ML Service Agent\"\n",
    "gcloud projects add-iam-policy-binding $PROJECT \\\n",
    "    --member serviceAccount:$SVC_ACCOUNT --role roles/ml.serviceAgent\n",
    "echo \"Done\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "!bash ./enable_tpu_mlengine.sh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Try preprocessing locally"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "export PYTHONPATH=${PYTHONPATH}:${PWD}/mymodel\n",
    "  \n",
    "rm -rf /tmp/out\n",
    "python -m trainer.preprocess \\\n",
    "       --train_csv /tmp/input.csv \\\n",
    "       --validation_csv /tmp/input.csv \\\n",
    "       --labels_file /tmp/labels.txt \\\n",
    "       --project_id $PROJECT \\\n",
    "       --output_dir /tmp/out --runner=DirectRunner"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls -l /tmp/out"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now run it over full training and evaluation datasets.  This will happen in Cloud Dataflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "export PYTHONPATH=${PYTHONPATH}:${PWD}/mymodel\n",
    "gcloud storage rm --recursive --continue-on-error gs://${BUCKET}/tpu/resnet/data\n",
    "python -m trainer.preprocess \\\n",
    "       --train_csv gs://cloud-ml-data/img/flower_photos/train_set.csv \\\n",
    "       --validation_csv gs://cloud-ml-data/img/flower_photos/eval_set.csv \\\n",
    "       --labels_file /tmp/labels.txt \\\n",
    "       --project_id $PROJECT \\\n",
    "       --output_dir gs://${BUCKET}/tpu/resnet/data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above preprocessing step will take <b>15-20 minutes</b>. Wait for the job to finish before you proceed. Navigate to [Cloud Dataflow section of GCP web console](https://console.cloud.google.com/dataflow) to monitor job progress. You will see something like this <img src=\"dataflow.png\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternately, you can simply copy my already preprocessed files and proceed to the next step:\n",
    "<pre>\n",
    "gcloud storage cp gs://cloud-training-demos/tpu/resnet/data/* gs://${BUCKET}/tpu/resnet/copied_data\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "gcloud storage ls gs://${BUCKET}/tpu/resnet/data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train on the Cloud"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "echo -n \"--num_train_images=$(gcloud storage cat gs://cloud-ml-data/img/flower_photos/train_set.csv | wc -l)  \"\n",
    "echo -n \"--num_eval_images=$(gcloud storage cat gs://cloud-ml-data/img/flower_photos/eval_set.csv | wc -l)  \"\n",
    "echo \"--num_label_classes=$(cat /tmp/labels.txt | wc -l)\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "TOPDIR=gs://${BUCKET}/tpu/resnet\n",
    "OUTDIR=${TOPDIR}/trained\n",
    "JOBNAME=imgclass_$(date -u +%y%m%d_%H%M%S)\n",
    "echo $OUTDIR $REGION $JOBNAME\n",
    "gcloud storage rm --recursive --continue-on-error $OUTDIR  # Comment out this line to continue training from the last time\n",
    "gcloud ml-engine jobs submit training $JOBNAME \\\n",
    "  --region=$REGION \\\n",
    "  --module-name=trainer.resnet_main \\\n",
    "  --package-path=$(pwd)/mymodel/trainer \\\n",
    "  --job-dir=$OUTDIR \\\n",
    "  --staging-bucket=gs://$BUCKET \\\n",
    "  --scale-tier=BASIC_TPU \\\n",
    "  --runtime-version=$TFVERSION --python-version=3.5 \\\n",
    "  -- \\\n",
    "  --data_dir=${TOPDIR}/data \\\n",
    "  --model_dir=${OUTDIR} \\\n",
    "  --resnet_depth=18 \\\n",
    "  --train_batch_size=128 --eval_batch_size=32 --skip_host_call=True \\\n",
    "  --steps_per_eval=250 --train_steps=1000 \\\n",
    "  --num_train_images=3300  --num_eval_images=370  --num_label_classes=5 \\\n",
    "  --export_dir=${OUTDIR}/export"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above training job will take 15-20 minutes. \n",
    "Wait for the job to finish before you proceed. \n",
    "Navigate to [Cloud ML Engine section of GCP web console](https://console.cloud.google.com/mlengine) \n",
    "to monitor job progress.\n",
    "\n",
    "The model should finish with a 80-83% accuracy (results will vary):\n",
    "```\n",
    "Eval results: {'global_step': 1000, 'loss': 0.7359053, 'top_1_accuracy': 0.82954544, 'top_5_accuracy': 1.0}\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "gcloud storage ls gs://${BUCKET}/tpu/resnet/trained/export/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can look at the training charts with TensorBoard:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "OUTDIR = 'gs://{}/tpu/resnet/trained/'.format(BUCKET)\n",
    "from google.datalab.ml import TensorBoard\n",
    "TensorBoard().start(OUTDIR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "TensorBoard().stop(11531)\n",
    "print(\"Stopped Tensorboard\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These were the charts I got (I set smoothing to be zero):\n",
    "<img src=\"resnet_traineval.png\" height=\"50\"/>\n",
    "As you can see, the final blue dot (eval) is quite close to the lowest training loss, indicating that the model hasn't overfit.  The top_1 accuracy on the evaluation dataset, however, is 80% which isn't that great. More data would help.\n",
    "<img src=\"resnet_accuracy.png\" height=\"50\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deploying and predicting with model\n",
    "\n",
    "Deploy the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "MODEL_NAME=\"flowers\"\n",
    "MODEL_VERSION=resnet\n",
    "MODEL_LOCATION=$(gcloud storage ls gs://${BUCKET}/tpu/resnet/trained/export/ | tail -1)\n",
    "echo \"Deleting/deploying $MODEL_NAME $MODEL_VERSION from $MODEL_LOCATION ... this will take a few minutes\"\n",
    "\n",
    "# comment/uncomment the appropriate line to run. The first time around, you will need only the two create calls\n",
    "# But during development, you might need to replace a version by deleting the version and creating it again\n",
    "\n",
    "#gcloud ml-engine versions delete --quiet ${MODEL_VERSION} --model ${MODEL_NAME}\n",
    "#gcloud ml-engine models delete ${MODEL_NAME}\n",
    "gcloud ml-engine models create ${MODEL_NAME} --regions $REGION\n",
    "gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version=$TFVERSION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use saved_model_cli to find out what inputs the model expects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "saved_model_cli show --dir $(gcloud storage ls gs://${BUCKET}/tpu/resnet/trained/export/ | tail -1) --tag_set serve --signature_def serving_default"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the model expects image_bytes.  This is typically base64 encoded"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To predict with the model, let's take one of the example images that is available on Google Cloud Storage <img src=\"http://storage.googleapis.com/cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg\" /> and convert it to a base64-encoded array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import base64, sys, json\n",
    "import tensorflow as tf\n",
    "import io\n",
    "with tf.gfile.GFile('gs://cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg', 'rb') as ifp:\n",
    "  with io.open('test.json', 'w') as ofp:\n",
    "    image_data = ifp.read()\n",
    "    img = base64.b64encode(image_data).decode('utf-8')\n",
    "    json.dump({\"image_bytes\": {\"b64\": img}}, ofp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls -l test.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Send it to the prediction service"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "gcloud ml-engine predict --model=flowers --version=resnet --json-instances=./test.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What does CLASS no. 3 correspond to? (remember that classes is 0-based)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "head -4 /tmp/labels.txt | tail -1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's how you would invoke those predictions without using gcloud"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from googleapiclient import discovery\n",
    "from oauth2client.client import GoogleCredentials\n",
    "import base64, sys, json\n",
    "import tensorflow as tf\n",
    "\n",
    "with tf.gfile.GFile('gs://cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg', 'rb') as ifp:\n",
    "  credentials = GoogleCredentials.get_application_default()\n",
    "  api = discovery.build('ml', 'v1', credentials=credentials,\n",
    "            discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json')\n",
    "  \n",
    "  request_data = {'instances':\n",
    "  [\n",
    "      {\"image_bytes\": {\"b64\": base64.b64encode(ifp.read()).decode('utf-8')}}\n",
    "  ]}\n",
    "\n",
    "  parent = 'projects/%s/models/%s/versions/%s' % (PROJECT, 'flowers', 'resnet')\n",
    "  response = api.projects().predict(body=request_data, name=parent).execute()\n",
    "  print(\"response={0}\".format(response))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<pre>\n",
    "# Copyright 2018 Google Inc. All Rights Reserved.\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#      http://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License.\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
